Xianzhi Yu

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

May 07, 2025

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Apr 07, 2025

AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference

Feb 06, 2025

CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference

Feb 06, 2025

FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers

Nov 21, 2024

FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs

Oct 22, 2024

FlatQuant: Flatness Matters for LLM Quantization

Oct 12, 2024

Pinpointing the Memory Behaviors of DNN Training

Apr 01, 2021